Vision-based Detection of Acoustic Timed Events: a Case Study on Clarinet Note Onsets
Acoustic events often have a visual counterpart. Knowledge of visual
information can aid the understanding of complex auditory scenes, even when
only a stereo mixdown is available in the audio domain, e.g., identifying which
musicians are playing in large musical ensembles. In this paper, we consider a
vision-based approach to note onset detection. As a case study we focus on
challenging, real-world clarinetist videos and carry out preliminary
experiments on a 3D convolutional neural network based on multiple streams and
purposely avoiding temporal pooling. We release an audiovisual dataset with 4.5
hours of clarinetist videos together with cleaned annotations which include
about 36,000 onsets and the coordinates for a number of salient points and
regions of interest. By performing several training trials on our dataset, we
learned that the problem is challenging. We found that the CNN model is highly
sensitive to the optimization algorithm and hyper-parameters, and that treating
the problem as binary classification may prevent the joint optimization of
precision and recall. To encourage further research, we publicly share our
dataset, annotations and all models and detail which issues we came across
during our preliminary experiments.
Comment: Proceedings of the First International Conference on Deep Learning
and Music, Anchorage, US, May, 2017 (arXiv:1706.08675v1 [cs.NE])
The Neecham Confusion Scale and the Delirium Observation Screening Scale: Capacity to discriminate and ease of use in clinical practice
BACKGROUND: Delirium is a frequent form of psychopathology in elderly hospitalized patients; it is a symptom of acute somatic illness. The consequences of delirium include high morbidity and mortality, lengthened hospital stay, and nursing home placement. Early recognition of delirium symptoms enables the underlying cause to be diagnosed and treated and can prevent negative outcomes. The aim of this study was to determine which of two delirium observation screening scales, the NEECHAM Confusion Scale or the Delirium Observation Screening (DOS) scale, has the better discriminative capacity for diagnosing delirium and which is more practical for daily use by nurses.
METHODS: The project was conducted on four wards of a university hospital; 87 patients were included. During 3 shifts, these patients were observed for symptoms of delirium, which were rated on both scales. A DSM-IV diagnosis of delirium was made or rejected by a geriatrician. Nurses were asked to rate the practical value of both scales using a structured questionnaire.
RESULTS: Both the DOS scale and the NEECHAM showed high sensitivity (0.89 – 1.00) and specificity (0.86 – 0.88). Nurses rated the DOS scale as significantly easier to use in practice than the NEECHAM.
CONCLUSION: Successful implementation of standardized observation depends largely on the consent of professionals and their acceptance of a scale. In our hospital, we therefore chose to involve nurses in the choice between the two instruments. During the study they were able to experience both scales and give their opinion on ease of use. In the final decision on the instrument, we found that both scales were very acceptable in terms of sensitivity and specificity, so the opinion of the nurses was decisive. They were positive about both instruments; however, they rated the DOS scale as significantly easier to use and relevant to their practice.
Our findings were obtained from a single site study with a small sample, so a large comparative trial to study the value of both scales further is recommended. On the basis of our experience during this study and findings from the literature with regard to the implementation of delirium guidelines, we will monitor the further implementation of the DOS Scale in our hospital with intensive consultation
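
As a reminder of how the two reported measures are defined, the following minimal sketch computes sensitivity and specificity from confusion-matrix counts. The function names and the counts in the usage line are hypothetical and chosen only for illustration; they are not the study's data.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of patients with delirium that the scale flags as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of patients without delirium that the scale flags as negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, NOT taken from the study:
# 8 detected cases, 2 missed cases, 45 correct negatives, 5 false alarms.
example_sensitivity = sensitivity(true_pos=8, false_neg=2)
example_specificity = specificity(true_neg=45, false_pos=5)
```

In a setup like the one described, the reference label for each patient would come from the geriatrician's DSM-IV diagnosis and the predicted label from the screening scale.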
Single Shot Temporal Action Detection
Temporal action detection is a very important yet challenging problem, since
videos in real applications are usually long, untrimmed and contain multiple
action instances. This problem requires not only recognizing action categories
but also detecting start time and end time of each action instance. Many
state-of-the-art methods adopt the "detection by classification" framework:
first generate proposals, and then classify them. The main drawback of this
framework is that the boundaries of action instance proposals have been fixed
during the classification step. To address this issue, we propose a novel
Single Shot Action Detector (SSAD) network based on 1D temporal convolutional
layers to skip the proposal generation step by directly detecting action
instances in untrimmed video. To design an SSAD network that works effectively
for temporal action detection, we empirically search for the best network
architecture, since no existing model can be directly adopted. Moreover, we
investigate input feature types and fusion strategies to further improve
detection accuracy. We conduct extensive
experiments on two challenging datasets: THUMOS 2014 and MEXaction2. With the
Intersection-over-Union threshold set to 0.5 during evaluation, SSAD
significantly outperforms other state-of-the-art systems, increasing mAP from
19.0% to 24.6% on THUMOS 2014 and from 7.4% to 11.0% on MEXaction2.
Comment: ACM Multimedia 201
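
The Intersection-over-Union criterion mentioned in this abstract can be illustrated for temporal segments with a minimal sketch. It assumes each action instance is a (start, end) pair in seconds; the function names are hypothetical and this is not the authors' evaluation code.

```python
from typing import Tuple

Segment = Tuple[float, float]  # (start_time, end_time) in seconds


def temporal_iou(pred: Segment, truth: Segment) -> float:
    """Overlap duration divided by the duration covered by either segment."""
    pred_start, pred_end = pred
    truth_start, truth_end = truth
    intersection = max(0.0, min(pred_end, truth_end) - max(pred_start, truth_start))
    union = (pred_end - pred_start) + (truth_end - truth_start) - intersection
    return intersection / union if union > 0 else 0.0


def matches(pred: Segment, truth: Segment, threshold: float = 0.5) -> bool:
    """A prediction counts as a match when its IoU reaches the threshold."""
    return temporal_iou(pred, truth) >= threshold
```

For example, a prediction spanning 2 s to 10 s against a ground-truth instance spanning 4 s to 12 s overlaps for 6 s out of a 10 s union, giving an IoU of 0.6, which would pass a 0.5 threshold.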